1. INTRODUCTION

Over the last decade, the spread of the internet and the revolution in communication and mobile
technology have become evident. This revolution has had a clear impact on the local and global transmission
and exchange of information and news. Social media and online news websites have boomed, making it easy
for most people to get news at any time and from anywhere in the world. The increasing amount of time spent
on social media sites is leading people to rely on these platforms for news and to reduce, or even abandon,
traditional methods of accessing news and following up on information [1]. This development has both
positive and negative aspects. On the positive side, it allows a steady flow of information, the free
expression of opinion and bias, amiable conversation with minimal effort, speed of access, and
convenience. With social media and online websites, the world becomes much smaller. However, the
reliability of this information is a major concern. People need to keep up to date with events around the
world, and their dependence on social media for this leads them to believe information and news that go
viral in tweets, links, or posts, regardless of the credibility of the sources. All of this leads to the
creation and spread of fake news.

The concept of fake news is not new. It existed before the broad spread of the internet: publishers
release bogus data, which may be sensational, incomplete, or partial, to confuse readers and hinder their
ability to separate what is valid from what is not, in order to achieve financial or political purposes. In
addition, today's fake news may include tampered photographs and videos, which increase the suspicion of
individuals.

With the rapid spread of social networks, the danger of the disastrous effects of bogus news
spreading quickly over online social networks (OSNs) is escalating. Fake news consistently stirs up conflict
and problems. In recent years, the spread of fake news published on the internet has increased and has
succeeded in affecting social and political affairs. For example, the study in [2] showed the critical effect
of fake news on the 2016 US presidential election. Publishing fake news is a global issue and should be
considered a crime with very severe punishments. Therefore, we cannot simply trust social media platforms
to be a reliable and neutral source of information in general and of news in particular; the sources should be
verified. On the other hand, advances in the internet and technology have prompted researchers to exploit
developments in machine learning and artificial intelligence to build systems that predict and distinguish
false news [3].

Fake news is not an easy issue to handle. Researchers have tried to provide ways to manage
internet news and to inform social media users whether what they are posting is fake or real. Many models
have been proposed to detect fake news of all types using machine learning algorithms. The main issue with
these systems is that most of them use English datasets; machine learning models that support the Arabic
language are still very limited. This research focuses on Arabic fake news detection and proposes different
deep learning models for it.

The following section presents a brief background on the types of news carriers and the data types
that they provide as output. Section 3 reviews related work on detecting fake news in both English and
Arabic. Section 4 presents the methodology that we adopted to define our models. Our results are discussed
in section 5. Finally, section 6 concludes the paper.


2. OVERVIEW AND BACKGROUND 

News carriers comprise various kinds of media outlets that publish news topics and stories. The
massive amount of news flowing through these carriers depends heavily on public demand for this
knowledge worldwide. Reliance on information posted on social media has increased as more people
become comfortable with their social media accounts. In the United States of America, 6 out of 10 people
get news from social media [4]. Another study in 2017 [5] showed that around 67 percent of American
adults depend on OSNs such as Twitter, Facebook, and Snapchat for news, compared with 62 percent in
2016. This section describes the types of news carrier platforms and the data types they carry, to establish
a basis for understanding how fake news is created and spread.


2.1. Types of news carrier platforms 
News carrier platforms are divided into two categories: standalone websites and social media.

a. Standalone websites: any website that produces news. There are three main types: i) popular news
   sites: their presence on the internet is vast, and they usually have a social media presence with a large
   number of followers [6]; ii) blog sites: these sites are smaller than popular news sites and contain
   user-generated content, usually written from a personal perspective. This type of site is a high-risk
   source of information since the content reflects a user's bias rather than established facts on a
   specific topic [6]; and iii) media sites: these sites are not the same as social media. They focus on
   providing varied media content such as photography, news, videos, and tutorials.

b. Social media: the easiest way for news to travel is the sharing of information among people on social
   media. In 2017, 78% of users under 50 years of age stated that they publish news on social websites
   including Twitter and Facebook [5]. This statistic illustrates the nature of social media and how readily
   false information can spread through it.


2.2. Types of news data 

Understanding the types of data on these platforms is important for understanding how information
is obtained from the internet. There are four main categories of data: i) text: written content, in which text
is the form of communication. It uses punctuation and grammar to convey voice and tone [6]; ii) audio:
sound only. It presents no visual to the user, so it acts differently from other media forms; iii) hyperlink:
a link that takes the user from one page to a connected page. A hyperlink often leads from an ad to a host
website, and in fake news it is often how 'clickbait' works; and iv) multimedia: a combination of video,
graphics, and audio.


Int J Elec & Comp Eng, Vol. 12, No. 4, August 2022: 3951-3959 


Int J Elec & Comp Eng ISSN: 2088-8708


3. LITERATURE REVIEW 

Recent related work has examined the spread of fake news across social websites and proposed
models to detect it. Fake news detection can be defined as predicting the likelihood that a news article is
intentionally deceptive [4]. English fake news detection accounts for the largest share of studies, whereas
Arabic fake news detection is still very limited. In this section, we review some of this research on both
Arabic and English datasets.

Figueira and Oliveira [3] presented two opposite approaches to detecting fake news: human
involvement and algorithms. The first relies on users flagging fake news, which is then reviewed by
fact-checkers from media communities such as Snopes.com. The second uses models and algorithms to
check the validity of information sources and identify fake content. In their view, this approach is not yet
robust enough to identify properly which information is false. The authors proposed that each source be
tracked and assigned a score based on periodic assessments by a trusted third-party system. They
concluded that it is possible to recognize objective elements (the facts) in social media posts, which can
help in fighting fake news alongside algorithms such as machine learning and text mining, and the
hardware needed to access big data for training them.

Borges et al. [7] developed a deep learning model to address the stance detection challenge,
leveraging max pooling together with bidirectional recurrent neural networks (RNNs) and neural techniques
to build representations from the body and from the headlines of news articles, and combining these
representations with external correspondence features. Mahir et al. [8] proposed an approach to analyzing
fake news messages in social media posts by learning to predict accuracy assessments. They then compared
five machine learning algorithms to assess classification performance on the dataset. The experimental
results showed that the naive Bayes and support vector machine (SVM) classifiers outperform the other
algorithms. Nair et al. [9] presented some of the content types of fake news and the algorithms used to
detect and identify them. A major problem in fake news detection research is the lack of fake news
datasets in different fields, which affects model performance. Therefore, their paper presented a general
algorithm to detect fake news.

Ahmed et al. [10] presented a model for detecting fake news using n-gram analysis and six
supervised classifiers. For feature extraction, they compared term frequency (TF) and term
frequency-inverse document frequency (TF-IDF) with n-gram sizes from unigram to four-gram. They
collected a new dataset from real sources, concentrating on political news: real news articles from
Reuters.com and fake news from Kaggle.com datasets. They also tested their algorithm on the publicly
available dataset of Horne and Adali [11]. The experimental evaluation showed that the Lagrangian SVM
classifier with TF-IDF feature extraction achieved the highest accuracy of 92%. Qawasmeh et al. [12] used
different deep learning models to detect fake news and applied them to the FNC-1 dataset, an English
headline-article dataset. The proposed models are a bidirectional long short-term memory (LSTM)
concatenated model and a multi-head LSTM model, with accuracies of 85.3% and 82.9%, respectively.
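
To make the TF-IDF n-gram features mentioned above concrete, the following is a minimal pure-Python sketch. It is not the implementation used in [10]; the function names, the whitespace tokenization, and the unsmoothed IDF formula log(N / df) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_min=1, n_max=2):
    """Return all word n-grams of length n_min..n_max as space-joined strings."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(documents, n_min=1, n_max=2):
    """Compute one {n-gram: weight} dictionary per document.

    TF is the relative frequency of an n-gram inside a document.
    IDF is log(N / df), where df is the number of documents containing it.
    """
    grams_per_doc = [word_ngrams(doc.split(), n_min, n_max) for doc in documents]

    # Document frequency: count each n-gram once per document.
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(set(grams))

    n_docs = len(documents)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        weights.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return weights


# Two toy headlines (hypothetical data).
docs = ["officials deny the report", "officials confirm the report"]
scores = tfidf(docs)
```

N-grams that occur in every document, such as "the report" here, receive a weight of zero, while n-grams that distinguish one document from the others, such as "deny", receive a positive weight.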

Vedova et al. [13] proposed a machine learning model to detect fake news that combines news
content features with social context features based on social interactions, such as shares on Twitter. The
authors claimed that their algorithm outperforms existing algorithms by 4.8%. In addition, they
integrated their method into a Facebook Messenger chatbot and obtained an accuracy of 81.7% when
validating it in a real application.

Since identifying rumors at publishing time is very important, Alkhair et al. [14] proposed a novel
scheme that collects, analyzes, and classifies Arabic fake news from YouTube. To collect their data, they
used the YouTube API, which retrieves all videos matching certain criteria. They used three different
classifiers to distinguish rumor comments from non-rumor comments, and found that the result depends
on multiple features, such as the rumor topic and the classifier used. Rangel et al. [15] sought to recognize
misleading messages that were written to seem authentic in Twitter messages and news headlines. The
datasets they used were created in [16].

The prevalence of fake news is a social phenomenon that spreads through social media websites
such as Facebook and Twitter and between people. The fake news we are concerned with is one of many
types of deception on social websites, but it is a more important one because it is created with the
dishonest intention of misleading people [17]. Girgis et al. [17] constructed a model that predicts whether
a news item is fake based only on its content, approaching the problem from a deep learning perspective
with RNN models including vanilla RNN, LSTM, and GRU. Kaur et al. [18] proposed a novel multi-layered
voting ensemble model, which was tested using twelve classifiers on three datasets.

Bauskar et al. [19] constructed a machine learning approach based on natural language processing
(NLP) techniques for fake news detection using both social features and content-based features of news.
The proposed model achieved an F1 score of 90.33% with an average accuracy of 90.62%. In [20], a new
fake news detection model was presented, called the fake news detection model using grammatical
transformation on a deep neural network. Aldwin and Alwahedi [21] proposed an efficient approach that
lets users install a tool on their personal devices and use it to detect possible clickbait. The experiments
performed to evaluate the method showed high performance in recognizing potential sources of fake
news. In [22], Torabi and Taboada conducted a modeling experiment to demonstrate the sources and gaps
of imbalanced datasets for future guidance. In 2020, Ozbay and Alata [23] applied twenty-three artificial
intelligence models to detect fake news using the document-term matrix and the TF weighting method.


4. METHOD 
4.1. Overview of dataset 

The aim of this study is to address the Arabic fake news issue, recognize untrusted news, and
enhance detection tools. The news headline plays a major role in attracting the reader's attention and in
whether the reader goes on to read the news. Therefore, the dataset consists of claim headlines and article
bodies. The dataset was created by a group of researchers in 2018 [24] and can be downloaded from [25].
The data concern the war in Syria and related Middle East political issues. The whole dataset comprises
422 claims and 3,042 articles. Each claim can match several articles and carries other features including
ID, fact, URL, and stance. The source data is provided as 422 JavaScript files. Table 1 describes the source
data format and all of its features [25]. The stance label of each claim-article pair follows the labels of
the fake news challenge (FNC) dataset [26], [27]. Table 2 describes each stance.


Table 1. The dataset features description 


Feature | Description
ID | The claim ID
Claim | The claim textual content
Fact | The factuality label of the claim (true or false)
Article | The list of articles we retrieved for each claim using the Google Search API
URL | The links to each article
Stance | The stance of each article to the claim (agree, disagree, discuss or unrelated)
Rationale | The location of the lines in each article that contain the rationale for agreement or disagreement with the claim


Table 2. The dataset stances' description


Label | Description
Agree | The article agrees with the claim
Disagree | The article disagrees with the claim
Discuss | The article and the claim discuss the same idea, but they are not congruent with each other
Unrelated | The article and the claim discuss different ideas


4.2. Data preprocessing 

The source dataset is provided as JavaScript files. We converted it to .csv format, which resulted
in 3,042 records. Table 3 shows examples of data records. Then, we applied some preprocessing to prepare
the data for the deep learning models. Figure 1 illustrates the preprocessing steps of the dataset.
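
The conversion step can be sketched as follows. This is only an illustration under stated assumptions: it assumes each source file holds one claim as a JSON object, and the key names ("ID", "claim", "fact", "article", "url", "stance") are hypothetical names modeled on Table 1, not the confirmed schema of the dataset in [25].

```python
import csv
import io
import json

# Column names of the flattened table (hypothetical, modeled on Table 1).
FIELDS = ["id", "claim", "fact", "article", "url", "stance"]


def flatten_claim(record):
    """Expand one claim record into one row per (claim, article) pair."""
    rows = []
    for article, url, stance in zip(record["article"], record["url"], record["stance"]):
        rows.append({
            "id": record["ID"],
            "claim": record["claim"],
            "fact": record["fact"],
            "article": article,
            "url": url,
            "stance": stance,
        })
    return rows


def claims_to_csv(json_texts):
    """Convert an iterable of JSON strings (one per claim file) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for text in json_texts:
        writer.writerows(flatten_claim(json.loads(text)))
    return buffer.getvalue()


# Hypothetical claim with two retrieved articles.
sample = json.dumps({
    "ID": "1",
    "claim": "claim text",
    "fact": "true",
    "article": ["first article", "second article"],
    "url": ["http://example.com/a", "http://example.com/b"],
    "stance": ["agree", "unrelated"],
})
csv_text = claims_to_csv([sample])
```

Each claim contributes as many CSV records as it has articles, which is how 422 claims can yield 3,042 records.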


4.2.1. Data cleaning 

First, we check for empty values and remove every record that has an empty cell, which leaves
3,037 records. Our study focuses on headline-article data, so we ignore other features such as ID, fact, and
URL. We thus work on a headline-article dataset in .csv format that comprises 3,037 records with three
features: claim, article, and label. In addition, we remove all non-Arabic characters, numbers, punctuation,
and stop words. The dataset is split into training, validation, and testing sets in the ratio 60%, 20%, and
20%, respectively. The testing set was isolated before the training step and is used only for model
evaluation.
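
A minimal sketch of these cleaning and splitting steps is given below. The Unicode range used to identify Arabic letters, the four-word stop-word list, the shuffling, and the random seed are all assumptions made for illustration; the paper does not specify them.

```python
import random
import re

# Arabic diacritics are dropped so that they do not split words (assumed range).
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Anything that is not a basic Arabic letter or whitespace becomes a space.
# This removes digits, Latin text, and punctuation (assumed range).
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"في", "من", "على", "إلى"}


def clean_text(text):
    """Keep only Arabic words that are not stop words."""
    text = DIACRITICS.sub("", text)
    text = NON_ARABIC.sub(" ", text)
    return " ".join(tok for tok in text.split() if tok not in STOP_WORDS)


def drop_empty(records):
    """Remove every record (a dict) that has at least one empty cell."""
    return [r for r in records if all(str(v).strip() for v in r.values())]


def split_dataset(records, train=0.6, val=0.2, seed=42):
    """Shuffle, then split into training, validation, and testing sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


cleaned = clean_text("سوريا 2018, Reuters في باريس!")
train_set, val_set, test_set = split_dataset(list(range(3037)))
```

With 3,037 records, this split yields 1,822 training, 607 validation, and 608 testing records; the testing part is set aside and not touched during training.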


4.2.2. Feature extraction 

Neural networks do not accept text as-is, so re-representing all texts in numerical form is a
mandatory step. Table 4 gives a brief description of the word embedding approaches considered. In our
research, we chose the Word2Vec approach [27] with a pre-trained model, referred to here as the Gensim
model.
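
The re-representation step can be illustrated with a short pure-Python sketch. It assumes that the pre-trained word vectors are available as a plain dictionary from word to vector; in practice this lookup would come from the loaded Gensim model. The function names, the reserved index 0, and the toy vectors are hypothetical.

```python
def build_vocab(texts):
    """Map each distinct token to a positive integer; 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


def encode(text, vocab, max_len):
    """Convert a text to a fixed-length list of token ids.

    Unknown tokens map to 0 (the padding id); long texts are truncated
    and short texts are padded with zeros on the right.
    """
    ids = [vocab.get(token, 0) for token in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


def embedding_matrix(vocab, pretrained, dim):
    """Build the initial weights of an embedding layer.

    Row i holds the pre-trained vector of the token with id i.
    Tokens without a pre-trained vector keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for token, idx in vocab.items():
        if token in pretrained:
            matrix[idx] = list(pretrained[token])
    return matrix


# Toy corpus and toy 2-dimensional "pre-trained" vectors (hypothetical values).
texts = ["a b c", "b d"]
pretrained = {"a": [0.1, 0.2], "b": [0.3, 0.4]}

vocab = build_vocab(texts)
sequence = encode("b d e", vocab, max_len=5)
weights = embedding_matrix(vocab, pretrained, dim=2)
```

The integer sequences are what the input layer receives, and the matrix supplies the starting weights of the embedding layer, so the network does not have to learn word vectors from scratch.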
Table 3. Examples of data records with stance labels


Claim (English translation) | Article (English translation) | Label
An informed source said on Wednesday that Kurdish factions had arrested suspected French Islamist militants in northern Syria, including a man previously convicted of running a jihadist recruitment network in France. | Syrian Kurdish faction detains suspected French militants in Syria - Mourinhos - Independent Mauritanian news agency Sunday, January 21, 2018 Syrian Kurdish faction detains suspected French militants in Syria Paris (Reuters) | agree
Syrian opposition officials said rebels in an enclave where the Syrian, Lebanese and Israeli borders intersect are negotiating a deal with the government to move to other rebel-held areas. | Syrian Kurdish faction detains suspected French militants in Syria - Mourinhos - Independent Mauritanian news agency Sunday, January 21, 2018 Syrian Kurdish faction detains suspected French militants in Syria Paris (Reuters) | unrelated


[Flowchart: source data (422 JavaScript files) -> .csv file with 3,042 records -> feature selection
(headline, article, and label) -> remove all records with empty cells -> .csv file with 3,037 records ->
remove all non-Arabic characters, punctuation, and stop words -> re-represent texts in numerical form ->
data splitting (training/validation/testing) -> ready to enter the deep learning model]


Figure 1. Dataset preprocessing steps 


Table 4. Word embedding details 


Approach | Description
Word embedding approach | Each word is converted to a sequence of numbers; these numbers are used as the initial weights.
Word2Vec approach | This approach is better than the previous one in terms of total model training time. It can use any already trained model, such as the Gensim model, so the weights are improved before the model is trained [27].
Gensim pre-trained model | This model is used for pre-training on the same dataset. When we applied it to our dataset, which included 3,037 records, we obtained 158,910 unique words.


4.3. Model architectures 
After running many experiments, we propose two models: model 1, Arabic fake news
headline-articles long short-term memory (AFND-LSTM), and model 2, Arabic fake news headline-articles
convolutional neural network long short-term memory (AFND-CNN-LSTM). In this section, we describe
both models in detail.

Figure 2 shows model 1 (AFND-LSTM) in Figure 2(a) and model 2 (AFND-CNN-LSTM) in
Figure 2(b). In both models, the claim and the article are merged before they enter the input layer, and the
embedding layer is then applied. As mentioned above, the Gensim pre-trained model is used in our models.
In model 1, the embedding layer output passes through an LSTM hidden layer [28] with 150 memory units
and a dropout of 0.2; we then normalize the values produced by the hidden layer using batch
normalization [29]. The output is passed through three dense layers (256, 128, and 4 units, respectively),
separated by one dropout layer. In model 2, the embedding layer output is fed to two convolutional neural
network hidden layers (32 and 64 filters). To avoid over-fitting, a max-pooling layer [30] is applied after
each convolutional hidden layer. The result is then fed to the LSTM hidden layer [28] with memory unit:
150 and layer_number: 5, followed by the flattening layer, which is mandatory. Next, the result is fed to
one dense layer (4 units). Finally, in both models, the last layer uses the softmax activation function to
produce the label classification. For training, the loss function is sparse categorical cross-entropy [31]
and the optimizer is Adam [32].
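
The final classification step can be illustrated with a small pure-Python sketch of the softmax function and the sparse categorical cross-entropy loss for a single sample. The ordering of the four stance labels and the example scores are assumptions; deep learning frameworks compute the same quantities over batches.

```python
import math

# Assumed order of the four stance classes (integer labels 0..3).
LABELS = ["agree", "disagree", "discuss", "unrelated"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1.

    Subtracting the maximum score first avoids overflow in exp().
    """
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_cross_entropy(probs, true_index):
    """Loss for one sample whose label is given as an integer class index."""
    return -math.log(probs[true_index])


# Hypothetical output scores of the last dense layer (4 units) for one pair.
logits = [2.0, 0.5, 0.1, -1.0]
probs = softmax(logits)
predicted = LABELS[probs.index(max(probs))]
loss = sparse_categorical_cross_entropy(probs, LABELS.index("agree"))
```

The loss is "sparse" because the true label is passed as a single integer rather than a one-hot vector; it is small when the model assigns high probability to the correct stance and grows as that probability approaches zero.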


[Figure 2 diagrams. Boxes in panel (a), model 1: Merge claim with article; Embedding layer (Gensim
pre-trained word-embedding model); LSTM (memory units: 150); Batch Normalization; Dense-1 (output: 265,
activation function: ReLU); Dropout (.2); Dense-2 (output: 128, activation function: sigmoid); Flatten Layer;
Dense-3 (output: 4, activation function: Softmax). Boxes in panel (b), model 2: Merge claim with article;
Embedding layer (Gensim pre-trained word-embedding model); CNN1: Conv1D (filters: 32, activation
function: ReLU) + MaxPooling1D; CNN2: Conv1D (filters: 64, activation function: ReLU) + MaxPooling1D;
LSTM (memory units: 150); Flatten Layer; Dense-1 (output: 4, activation function: Softmax)]


Figure 2. Arabic fake news detection models (a) model 1 AFND-LSTM and (b) model 2 AFND-CNN-LSTM 
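
To see how data flows through model 2, the per-sample tensor shapes can be traced layer by layer, as in the sketch below. The sequence length (100), embedding dimension (300), kernel size (3), and pooling size (2) are assumptions for illustration, as is the assumption that the LSTM returns its full output sequence; the paper does not report these values. Only the filter counts, LSTM units, and number of classes come from the text.

```python
def conv1d_len(length, kernel_size):
    """Output length of a stride-1 one-dimensional convolution without padding."""
    return length - kernel_size + 1


def maxpool1d_len(length, pool_size):
    """Output length of max pooling whose stride equals the pool size."""
    return length // pool_size


def trace_model2(seq_len, embed_dim, kernel_size=3, pool_size=2,
                 lstm_units=150, n_classes=4):
    """Return a list of (layer name, per-sample output shape) for model 2."""
    shapes = [("embedding", (seq_len, embed_dim))]
    length = seq_len
    # Two convolution blocks with 32 and 64 filters, each followed by pooling.
    for name, filters in (("cnn1", 32), ("cnn2", 64)):
        length = conv1d_len(length, kernel_size)
        shapes.append((name + "_conv", (length, filters)))
        length = maxpool1d_len(length, pool_size)
        shapes.append((name + "_pool", (length, filters)))
    # Assumed: the LSTM emits one 150-dimensional vector per time step.
    shapes.append(("lstm", (length, lstm_units)))
    # Flattening turns the 2-D sequence output into one vector for the dense layer.
    shapes.append(("flatten", (length * lstm_units,)))
    shapes.append(("dense_softmax", (n_classes,)))
    return shapes


for layer_name, shape in trace_model2(seq_len=100, embed_dim=300):
    print(layer_name, shape)
```

Under these assumptions, each pooling layer roughly halves the sequence length, and the flattening layer is what converts the two-dimensional LSTM output into the single vector that the final 4-unit dense layer expects.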


5. RESULTS 

The online Python platform Google Colab [33] was used. Our models include several
hyper-parameters, such as the word embedding model, embedding dimension, loss function, and optimizer.
The values used in the models were chosen based on our experiments.

After many experiments, we propose two models for the Arabic fake news headline-article
problem: model 1 (AFND-LSTM) and model 2 (AFND-CNN-LSTM), as described above. Model 2 yields
better accuracy. Table 5 shows the evaluation metrics for both models. Figure 3 shows the accuracy,
Figure 3(a), and the loss, Figure 3(b), on the training and validation sets over the epochs for model 2.
Table 5. Evaluation statistics


Model | Accuracy | Precision | Recall | F1-score
Model 1: AFND-LSTM | 68.2% | 0.7 | 0.0 | 0.0
Model 2: AFND-CNN-LSTM | 70% | 0.6 | 1.0 | 0.75
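
For reference, the relationship between the precision, recall, and F1-score columns can be sketched as follows. The helper names and the example counts are hypothetical, and the paper does not state how the metrics were averaged over the four stance classes, so this shows only the per-class definitions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


# Precision 0.6 with recall 1.0, as reported for model 2.
f1_model2 = f1_score(0.6, 1.0)
# Hypothetical counts that produce the same precision and recall.
p, r, f1 = precision_recall_f1(tp=6, fp=4, fn=0)
```

A precision of 0.6 with a recall of 1.0 gives an F1-score of about 0.75, which matches the model 2 row, and any recall of 0.0 forces the F1-score to 0.0, as in the model 1 row.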


[Figure 3 plots: (a) accuracy versus epoch and (b) loss versus epoch, each with a training curve and a
validation curve]


Figure 3. Model 2 accuracy and loss function during training (a) accuracy and (b) loss 


6. CONCLUSION 

Fake news is designed to spread widely among people. With social media platforms like Twitter
and Facebook, it has become easier for false information to spread quickly. Their fast accessibility and
reach are being misused by many people to disseminate fake news, which has a negative influence on
people and society. This paper presented a deep learning method for Arabic stance detection in order to
address the Arabic fake news issue, improve automatic detection tools, and recognize unreliable news. The
dataset used consists of claim headlines and article bodies. We applied pre-processing, including data
cleaning and feature extraction through word embedding, to prepare the data for the deep learning models,
and split the dataset into 60%, 20%, and 20% for training, validation, and testing, respectively.

The results show that AFND-CNN-LSTM gives better accuracy than AFND-LSTM: 70% compared
with 68.2% for the first model. This means that combining the two models improves accuracy, recall, and
F1-score, although precision falls from 0.7 to 0.6 (Table 5). Despite the promising results, there is room
for future work, such as improving the accuracy of the models by adding more layers or by combining them
with other classifiers. This paper was written in the hope that these models help improve the detection of
false news and make readers aware of misinformation, so that they are less likely to spread it.